Self-Verification Improves Few-Shot Clinical Information Extraction
Extracting patient information from unstructured text is a critical task in
health decision-support and clinical research. Large language models (LLMs)
have shown the potential to accelerate clinical curation via few-shot
in-context learning, in contrast to supervised learning, which requires much
more costly human annotations. However, despite drastic advances in modern LLMs
such as GPT-4, they still struggle with issues regarding accuracy and
interpretability, especially in mission-critical domains such as health. Here,
we explore a general mitigation framework using self-verification, which
leverages the LLM to provide provenance for its own extraction and check its
own outputs. This is made possible by the asymmetry between verification and
generation, where the former is often much easier than the latter. Experimental
results show that our method consistently improves accuracy for various LLMs in
standard clinical information extraction tasks. Additionally, self-verification
yields interpretations in the form of a short text span corresponding to each
output, which makes it very efficient for human experts to audit the results,
paving the way towards trustworthy extraction of clinical information in
resource-constrained scenarios. To facilitate future research in this
direction, we release our code and prompts.
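
As a rough sketch of the self-verification loop described above, the snippet below pairs an extraction call with a verification call that asks the model to quote the evidence span grounding its answer; the call_llm helper, prompt wording, and field names are illustrative assumptions, not the authors' released prompts.

```python
# Illustrative self-verification loop for clinical information extraction.
# `call_llm` is a hypothetical stand-in for any chat-completion API (e.g. GPT-4).

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its text response."""
    raise NotImplementedError("plug in your LLM client here")

def extract(note: str, field: str) -> str:
    # Step 1: prompt-based extraction of one clinical field.
    prompt = (
        f"Extract the patient's {field} from the clinical note below.\n"
        f"Note:\n{note}\n{field}:"
    )
    return call_llm(prompt).strip()

def verify(note: str, field: str, value: str) -> tuple[bool, str]:
    # Step 2: ask the model for provenance -- the exact supporting span --
    # or an explicit refusal if the value is not grounded in the note.
    prompt = (
        f"The value '{value}' was extracted as the patient's {field}.\n"
        f"Quote the exact span of the note that supports it, or reply "
        f"UNSUPPORTED if no such span exists.\nNote:\n{note}\nEvidence:"
    )
    evidence = call_llm(prompt).strip()
    return evidence != "UNSUPPORTED", evidence

def extract_with_verification(note: str, field: str) -> dict:
    # Keep an answer only if the model can ground it; the evidence span
    # doubles as an audit trail for human reviewers.
    value = extract(note, field)
    supported, evidence = verify(note, field, value)
    return {
        "field": field,
        "value": value if supported else None,
        "evidence": evidence if supported else None,
    }
```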
An Investigation into the Effects of Pre-training Data Distributions for Pathology Report Classification
Pre-trained transformer models have demonstrated success across many natural
language processing (NLP) tasks. In applying these models to the clinical
domain, a prevailing assumption is that pre-training language models from
scratch on large-scale biomedical data results in substantial improvements. We
test this assumption with 4 pathology classification tasks on a corpus of 2907
prostate cancer pathology reports. We evaluate 5 transformer pre-trained models
that are the same size but differ in pre-training corpora. Specifically, we
analyze 3 categories of models: 1) General-domain: BERT and Turing Natural
Language Representation (TNLR) models, which use general corpora for
pre-training; 2) Mixed-domain: BioBERT, which is obtained from BERT by including
PubMed abstracts in pre-training, and Clinical BioBERT, which additionally
includes MIMIC-III clinical notes; and 3) Domain-specific: PubMedBERT, which is
pre-trained from scratch on PubMed abstracts. We find that the mixed-domain and
domain-specific models exhibit faster feature disambiguation during
fine-tuning. However, the domain-specific model, PubMedBERT, can overfit to
minority classes when presented with class imbalance, a common scenario in
pathology report data. At the same time, the mixed-domain models are more
resistant to overfitting. Our findings indicate that general natural-language
and domain-specific corpora serve complementary purposes in pre-training for
pathology report classification: the first enables resistance to overfitting
when fine-tuning on an imbalanced dataset, while the second allows
for more accurate modelling of the fine-tuning domain. An expert evaluation is
also conducted to reveal common outlier modes of each model. Our results could
inform better fine-tuning practices in the clinical domain, possibly by
leveraging the benefits of mixed-domain models for imbalanced downstream
datasets.
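
One way to probe the imbalance behaviour described above is to fine-tune with an inverse-frequency class-weighted loss. The sketch below does this with the Hugging Face transformers Trainer; the checkpoint name, label count, and class counts are placeholders, and it illustrates weighted fine-tuning in general rather than the paper's experimental setup.

```python
# Class-weighted fine-tuning sketch for imbalanced pathology report labels.
# Checkpoint, number of labels, and class counts are placeholders.
import torch
from torch.nn import CrossEntropyLoss
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer

checkpoint = "bert-base-uncased"  # swap in a mixed- or domain-specific model
num_labels = 4
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=num_labels)

# Inverse-frequency weights: rarer classes contribute more to the loss,
# counteracting the imbalance typical of pathology report data.
class_counts = torch.tensor([2000.0, 500.0, 300.0, 107.0])  # placeholder counts
class_weights = class_counts.sum() / (num_labels * class_counts)

class WeightedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fn = CrossEntropyLoss(weight=class_weights.to(outputs.logits.device))
        loss = loss_fn(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss
```

A WeightedTrainer instance is then constructed and run exactly like the standard Trainer, with tokenized training and evaluation datasets.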
Predicting early psychiatric readmission with natural language processing of narrative discharge summaries
The ability to predict psychiatric readmission would facilitate the development of interventions to reduce this risk, a major driver of psychiatric health-care costs. The symptoms or characteristics of illness course necessary to develop reliable predictors are not available in coded billing data, but may be present in narrative electronic health record (EHR) discharge summaries. We identified a cohort of individuals admitted to a psychiatric inpatient unit between 1994 and 2012 with a principal diagnosis of major depressive disorder, and extracted inpatient psychiatric discharge narrative notes. Using these data, we trained a 75-topic Latent Dirichlet Allocation (LDA) model, a form of natural language processing that identifies groups of words associated with topics discussed in a document collection. The cohort was randomly split to derive a training (70%) and testing (30%) data set, and we trained separate support vector machine models for baseline clinical features alone, baseline features plus common individual words, and the above plus topics identified from the 75-topic LDA model. Of 4687 patients with inpatient discharge summaries, 470 were readmitted within 30 days. The 75-topic LDA model included topics linked to psychiatric symptoms (suicide, severe depression, anxiety, trauma, eating/weight, and panic) and major depressive disorder comorbidities (infection, postpartum, brain tumor, diarrhea, and pulmonary disease). By including LDA topics, prediction of readmission, as measured by area under receiver-operating characteristic curves in the testing data set, was improved from baseline (area under the curve 0.618) to baseline + 1000 words (0.682) to baseline + 75 topics (0.784). Inclusion of topics derived from narrative notes allows more accurate discrimination of individuals at high risk for psychiatric readmission in this cohort. Topic modeling and related approaches offer the potential to improve prediction using EHRs, if generalizability can be established in other clinical cohorts.
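
For readers who want to reproduce the shape of this pipeline, the sketch below builds a 75-topic LDA representation of discharge notes, concatenates it with baseline clinical features, and fits a linear SVM evaluated by AUC using scikit-learn; the load_cohort loader, vocabulary size, and preprocessing details are assumptions rather than specifics from the study.

```python
# Topic-model + SVM readmission pipeline, sketched with scikit-learn.
# `load_cohort` is a hypothetical loader returning raw note text, a 2-D array
# of baseline clinical features, and a binary 30-day readmission label.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

notes, baseline_features, readmitted = load_cohort()  # hypothetical

# Bag-of-words counts over discharge summaries, then a 75-topic LDA model.
counts = CountVectorizer(max_features=10_000, stop_words="english").fit_transform(notes)
topics = LatentDirichletAllocation(n_components=75, random_state=0).fit_transform(counts)

# 70/30 split; concatenate baseline features with per-document topic proportions.
X = np.hstack([baseline_features, topics])
X_tr, X_te, y_tr, y_te = train_test_split(X, readmitted, test_size=0.3, random_state=0)

svm = SVC(kernel="linear", probability=True).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, svm.predict_proba(X_te)[:, 1])
print(f"Test AUC, baseline + 75 topics: {auc:.3f}")
```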